High-definition (HD) semantic map generation of the environment is an essential component of autonomous driving. Existing methods have achieved good performance in this task by fusing different sensor modalities, such as LiDAR and camera. However, current works are based on raw-data or network feature-level fusion and only consider short-range HD map generation, which limits their deployment in realistic autonomous driving applications. In this paper, we focus on building HD maps at short range, i.e., within 30 m, and on predicting long-range HD maps up to 90 m, which downstream path planning and control tasks require to improve the smoothness and safety of autonomous driving. To this end, we propose a novel network named SuperFusion, which exploits the fusion of LiDAR and camera data at multiple levels. We benchmark SuperFusion on the nuScenes dataset and a self-recorded dataset and show that it outperforms the state-of-the-art baseline methods by large margins. Furthermore, we propose a new metric to evaluate long-range HD map prediction and apply the generated HD map to a downstream path planning task. The results show that the long-range HD maps predicted by our method lead to better path planning for autonomous vehicles. The code will be available at https://github.com/haomo-ai/SuperFusion.
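Conversational question generation (CQG) is an important task that lets machines assist humans through conversation, for example in interactive reading comprehension. Compared with traditional single-turn question generation (SQG), CQG is more challenging in that the generated question must not only be meaningful but also align with the conversation history so far. While previous studies mainly focus on how to model the flow and alignment of the conversation, there has so far been no thorough study of which parts of the context and history the model actually needs. We argue that shortening the context and history is crucial, because it helps the model optimize more for conversational alignment. To this end, we propose CoHS-CQG, a two-stage CQG framework that adopts a CoHS module to shorten the context and history of the input. In particular, CoHS selects contiguous sentences and history turns according to their relevance scores using a top-p strategy. Our model achieves state-of-the-art performance on CoQA in both the answer-aware and answer-unaware settings.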
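The top-p selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes relevance scores are turned into a distribution with a softmax and that the highest-scoring items are kept until their cumulative mass reaches `p`; the function name `top_p_select`, the softmax normalization, and the default `p` are illustrative assumptions, and the contiguity constraint described in the abstract is not enforced here.

```python
import math


def top_p_select(scores, p=0.8):
    """Return indices of the highest-scoring items whose softmax mass reaches p.

    Hypothetical helper: the abstract only states that a top-p strategy over
    relevance scores is used; the softmax normalization is an assumption.
    """
    if not scores:
        return []
    # Numerically stable softmax over the relevance scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Visit items from most to least relevant (ties keep original order).
    order = sorted(range(len(scores)), key=lambda i: probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= p:
            break
    # Restore document order so the shortened input stays readable.
    return sorted(chosen)
```

A smaller `p` keeps fewer sentences or history turns, which is the shortening effect the abstract argues for.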
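Convex model predictive control (MPC) with a single rigid body model has demonstrated strong performance on real legged robots. However, convex MPC is limited by its assumptions, such as small rotation angles and pre-defined gaits, which restrict the richness of potential solutions. We remove these assumptions and solve the complete mixed-integer non-convex program with a single rigid body model. We first collect a dataset of pre-solved problems offline, then learn the problem-solution map so that the optimization can be solved quickly for MPC. If warm starts can be found, the offline problems can be solved close to global optimality. The proposed controller is tested by generating various gaits and behaviors depending on the initial conditions. Hardware tests demonstrate online gait generation and adaptation at more than 50 Hz based on sensor feedback.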
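This paper proposes TrafficFlowGAN, a physics-informed, flow-based generative adversarial network (GAN) for uncertainty quantification (UQ) of dynamical systems. TrafficFlowGAN adopts a normalizing flow model as the generator to explicitly estimate the data likelihood. This flow model is trained to maximize the data likelihood and to generate synthetic data that can fool a convolutional discriminator. We further regularize this training process using prior physics information, an approach known as physics-informed deep learning (PIDL). To the best of our knowledge, we are the first to propose an integration of flow, GAN, and PIDL for UQ problems. We take traffic state estimation (TSE), which aims to estimate traffic variables (e.g., traffic density and velocity) from partially observed data, as an example to demonstrate the performance of the proposed model. We conduct numerical experiments in which the proposed model is applied to learn the solutions of stochastic differential equations. The results demonstrate the robustness and accuracy of the proposed model, together with its ability to learn a machine learning surrogate model. We also test it on a real-world dataset (NGSIM) to show that the proposed TrafficFlowGAN can outperform the baselines, including the pure flow model, the physics-informed flow model, and the flow-based GAN model.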
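Deep learning methods have been shown to be effective in representing ground-state wave functions of quantum many-body systems. Existing methods use convolutional neural networks (CNNs) for square lattices because of their image-like structure. For non-square lattices, existing methods use graph neural networks (GNNs), in which structural information is not precisely captured, so additional hand-crafted sublattice encoding is required. In this work, we propose lattice convolutions, in which a set of proposed operations converts non-square lattices into grid-like augmented lattices on which regular convolution can be applied. Based on the proposed lattice convolutions, we design lattice convolutional networks (LCN) that use self-gating and attention mechanisms. Experimental results show that our method performs on par with or better than existing methods on the spin-1/2 $J_1$-$J_2$ Heisenberg model over the square, honeycomb, triangular, and kagome lattices, without using hand-crafted encoding.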
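Recently, streaming technology has greatly promoted the development of the livestream field. Because livestream recordings are excessively long, extracting highlight segments is essential for effective reproduction and redistribution. Although many approaches have proven effective for highlight detection in other modalities, the challenges of livestream processing, such as extreme durations, large topic shifts, and abundant irrelevant information, severely hamper the adaptation and compatibility of these methods. In this paper, we formulate a new task, livestream highlight detection, discuss and analyze the difficulties listed above, and propose a novel architecture, AntPivot, to solve this problem. Specifically, we first encode the raw data into multiple views and model their temporal relations to capture clues with a hierarchical attention mechanism. We then convert the detection of highlight clips into a search for the optimal decision sequence and use the fully integrated representations to predict the final results with a dynamic-programming mechanism. Furthermore, we construct a fully annotated dataset, AntHighlight, to instantiate this task and evaluate the performance of our model. Extensive experiments demonstrate the effectiveness and validity of the proposed method.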
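Movement and pose assessment of newborns lets experienced pediatricians predict neurodevelopmental disorders, allowing early intervention for related diseases. However, most recent AI approaches to human pose estimation focus on adults, and a public benchmark for infant pose estimation is lacking. In this paper, we fill this gap by proposing an infant pose dataset and a Deep Aggregation Vision Transformer for human pose estimation, which introduces a fast-to-train full transformer framework that does not use convolution operations to extract features in the early stages. It generalizes Transformer + MLP to high-resolution deep layer aggregation within feature maps, thereby enabling information fusion between different vision levels. We pre-train AggPose on the COCO pose dataset and apply it to our newly released large-scale infant pose estimation dataset. The results show that AggPose can effectively learn multi-scale features across different resolutions and significantly improves the performance of infant pose estimation. We show that AggPose outperforms the hybrid models HRFormer and TokenPose on the infant pose estimation dataset. Moreover, AggPose also achieves a 0.8 AP improvement on COCO val pose estimation. Our code is available at github.com/szar-lab/aggpose.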
Graph Neural Networks (GNNs) are powerful tools for graph representation learning. Despite their rapid development, GNNs also face some challenges, such as over-fitting, over-smoothing, and non-robustness. Previous works indicate that these problems can be alleviated by random dropping methods, which integrate augmented data into models by randomly masking parts of the input. However, some open problems of random dropping on GNNs remain to be solved. First, it is challenging to find a universal method that is suitable for all cases, given the divergence of different datasets and models. Second, augmented data introduced to GNNs causes incomplete coverage of parameters and an unstable training process. Third, there is no theoretical analysis of the effectiveness of random dropping methods on GNNs. In this paper, we propose a novel random dropping method called DropMessage, which performs dropping operations directly on the propagated messages during the message-passing process. More importantly, we find that DropMessage provides a unified framework for most existing random dropping methods, based on which we give a theoretical analysis of their effectiveness. Furthermore, we elaborate on the superiority of DropMessage: it stabilizes the training process by reducing sample variance, and it preserves information diversity from the perspective of information theory, which makes it a theoretical upper bound of other methods. To evaluate our proposed method, we conduct experiments covering multiple tasks on five public datasets and two industrial datasets with various backbone models. The experimental results show that DropMessage has the advantages of both effectiveness and generalization, and can significantly alleviate the problems mentioned above.
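Dropping directly on propagated messages can be illustrated with a minimal sketch. It assumes the messages form a matrix with one row per edge, that each element is zeroed independently with probability `drop_rate`, and that surviving elements are rescaled by `1 / (1 - drop_rate)` as in standard dropout; the function name `drop_message` and the rescaling are illustrative assumptions, not details given in the abstract.

```python
import random


def drop_message(messages, drop_rate, rng=None):
    """Randomly zero individual elements of a message matrix.

    messages:  list of rows, one row of floats per propagated message (edge).
    drop_rate: probability of dropping each element, in [0, 1).
    The dropout-style rescaling of kept elements is an assumption.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = rng or random.Random()
    keep = 1.0 - drop_rate
    # Mask element-wise, so a single message can be partially dropped.
    return [
        [x / keep if rng.random() >= drop_rate else 0.0 for x in row]
        for row in messages
    ]
```

Masking at the element level of the message matrix, rather than whole nodes, edges, or input features, is the finest granularity in this family of methods, which is consistent with the abstract's claim that DropMessage unifies most existing random dropping methods.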
The click-through rate (CTR) prediction task is to predict whether a user will click on a recommended item. As enormous amounts of data are produced online daily, accelerating CTR prediction model training is critical to keeping the model up to date and reducing the training cost. One approach to increasing the training speed is large-batch training. However, as shown in computer vision and natural language processing tasks, training with a large batch easily suffers from a loss of accuracy. Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks. To tackle this problem, we first show theoretically that the different frequencies of ids make it challenging to scale hyperparameters when scaling the batch size. To stabilize the training process in a large-batch setting, we develop adaptive Column-wise Clipping (CowClip). It enables an easy and effective scaling rule for the embeddings, which keeps the learning rate unchanged and scales the L2 loss. We conduct extensive experiments with four CTR prediction networks on two real-world datasets and successfully scale the batch size to 128 times the original without accuracy loss. In particular, for training the CTR prediction model DeepFM on the Criteo dataset, our optimization framework enlarges the batch size from 1K to 128K with over 0.1% AUC improvement and reduces training time from 12 hours to 10 minutes on a single V100 GPU. Our code is available at https://github.com/bytedance/LargeBatchCTR.
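The abstract does not spell out the clipping rule, so the following is only a rough sketch of what adaptive per-id gradient clipping for embeddings could look like. It assumes that each id's gradient is clipped to a threshold proportional to the norm of that id's embedding vector and to how often the id occurs in the batch; the function name `clip_embedding_grads` and the parameters `ratio` and `floor` are hypothetical and are not taken from the CowClip code.

```python
import math


def clip_embedding_grads(weights, grads, counts, ratio=1.0, floor=1e-5):
    """Clip each id's embedding gradient against an adaptive threshold.

    weights, grads: one vector (list of floats) per id.
    counts:         number of occurrences of each id in the batch.
    Assumed rule (not from the abstract):
        threshold = count * max(ratio * ||w||, floor)
    """
    clipped = []
    for w, g, c in zip(weights, grads, counts):
        g_norm = math.sqrt(sum(x * x for x in g))
        w_norm = math.sqrt(sum(x * x for x in w))
        threshold = c * max(ratio * w_norm, floor)
        # Leave small gradients untouched, shrink large ones onto the threshold.
        scale = 1.0 if g_norm <= threshold else threshold / g_norm
        clipped.append([x * scale for x in g])
    return clipped
```

Tying the threshold to the per-id count reflects the abstract's observation that ids occur with very different frequencies, so a single global clipping value would treat frequent and rare ids inconsistently as the batch grows.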
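This paper proposes DeciWatch, a simple baseline framework for video-based 2D/3D human pose estimation that achieves a 10x efficiency improvement over existing works without any performance degradation. Unlike current solutions that estimate every frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that watches only sparsely sampled frames, taking advantage of the continuity of human motion and the lightweight pose representation. Specifically, DeciWatch uniformly samples fewer than 10% of the video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers the remaining frames using another Transformer-based network. Comprehensive experimental results on three video-based human pose estimation and body mesh recovery tasks with four datasets validate the efficiency and effectiveness of DeciWatch. Code is available at https://github.com/cure-lab/deciwatch.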
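The sample and recover steps can be illustrated with a minimal sketch. It assumes poses are flat lists of joint coordinates, skips the denoising step, and uses plain linear interpolation as a stand-in for the Transformer-based recovery network described in the abstract; the function names and the default `interval` are illustrative assumptions.

```python
def sample_indices(num_frames, interval=12):
    """Uniformly pick every `interval`-th frame, starting at frame 0."""
    return list(range(0, num_frames, interval))


def recover_linear(sampled_idx, sampled_poses, num_frames):
    """Fill in all frames from sparse poses by linear interpolation.

    Stand-in for the learned recovery network: frames between two sampled
    frames are interpolated, frames after the last sample repeat its pose.
    """
    recovered = []
    j = 0  # index of the sampled frame at or before the current frame
    for t in range(num_frames):
        while j + 1 < len(sampled_idx) and sampled_idx[j + 1] <= t:
            j += 1
        if j + 1 < len(sampled_idx):
            t0, t1 = sampled_idx[j], sampled_idx[j + 1]
            a = (t - t0) / (t1 - t0)
            left, right = sampled_poses[j], sampled_poses[j + 1]
            recovered.append([(1 - a) * p + a * q for p, q in zip(left, right)])
        else:
            recovered.append(list(sampled_poses[j]))
    return recovered
```

With `interval=12`, only about 8% of the frames go through the expensive per-frame estimator; the quality of the result then depends on how well the recovery step exploits motion continuity, which is where the paper uses learned networks rather than interpolation.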